On-line direct data driven controller design approach with automatic
update for some of the tuning parameters

M. Tanaskovic, L. Fagiano, C. Novara and M. Morari * 


1 Introduction 

This manuscript contains technical details of recent results developed by the authors on an algorithm for the direct design of
controllers for nonlinear systems from data. The algorithm is able to automatically modify some of its tuning parameters
in order to increase control performance over time.


2 Problem formulation 

We consider a discrete-time, time-invariant, nonlinear system with one input and $n_x$ states that can be represented by the
following state equation:

$$x_{t+1} = g(x_t, u_t) + e_t, \tag{1}$$

where $t \in \mathbb{Z}$ is the discrete time step, $u_t \in \mathbb{R}$ is the control input, $x_t \in \mathbb{R}^{n_x}$ is the vector of states and $e_t \in \mathbb{R}^{n_x}$ is the
vector of disturbance signals that accounts for the contribution of both the measurement and process disturbances. We
make the following two assumptions about the disturbance signal $e_t$ and the nonlinear function $g$:

Assumption 1 The disturbance $e_t$ is bounded in magnitude:

$$e_t \in B_\epsilon = \{e_t : \|e_t\| \le \epsilon, \forall t \in \mathbb{Z}\}, \tag{2}$$

for some $\epsilon > 0$.


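Assumption 2 The function $g$ is Lipschitz continuous with respect to $u$, i.e. $g(x, \cdot) \in \mathcal{F}(\gamma_g, U)$ for any $x \in X$, where
$U \subset \mathbb{R}$ and $X \subset \mathbb{R}^{n_x}$ are compact (possibly very large) sets, and

$$\mathcal{F}(\gamma_g, U) = \{ g : \|g(0)\| < \infty,\ \|g(u_1) - g(u_2)\| \le \gamma_g \|u_1 - u_2\|,\ \forall u_1, u_2 \in U \}. \tag{3}$$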


The notation $\|\cdot\|$ stands for a suitable vector norm chosen by the user (typically the 2- or $\infty$-norm) and the presented results
hold for any particular norm selection.

It is assumed that the nonlinear function $g$ that describes the dynamics of the system (1) is unknown, but that a set $\mathcal{D}_N$
of $N$ noise corrupted input and state measurements generated by the system (1) is available:

$$\mathcal{D}_N = \{u_t, w_t\}_{t=-N}^{-1}, \quad w_t = (x_t, x_{t+1}). \tag{4}$$

We make the following assumption on the training data (4).

Assumption 3 The available measurements are such that $u_t \in U$ and $w_t \in X \times X$, $\forall t = -N, \ldots, -1$.

In this note we consider the notion of finite gain stability. 
Definition 2.1 (Finite gain stability) A nonlinear system (possibly time varying) with input $u_t \in \mathbb{R}$, state $x_t \in \mathbb{R}^{n_x}$ and
disturbance $e_t \in B_\epsilon$ is finite gain stable if there exist finite and nonnegative constants $\lambda_1$, $\lambda_2$ and $\beta$ such that:

$$\|x\|_\infty \le \lambda_1 \|u\|_\infty + \lambda_2 \|e\|_\infty + \beta, \quad \forall u \in \mathcal{U}, \forall e \in B_\epsilon, \tag{5}$$

where $x = (x_1, x_2, \ldots)$, $u = (u_1, u_2, \ldots)$, $e = (e_1, e_2, \ldots)$, $\|x\|_\infty = \sup_k \|x_k\|$ and $\mathcal{U}$ and $B_\epsilon$ are the domains of the
input and disturbance signals, respectively.

Based on this definition, we introduce the notion of $\gamma$-stabilizability.

Definition 2.2 The system (1) is $\gamma$-stabilizable if there exists a $\gamma < \infty$ and a function $f \in \mathcal{F}(\gamma, \mathbb{R}^{2n_x})$ such that the
closed-loop system with input $r_t \in B_{\bar r} \subset X$ and disturbance $e_t \in B_\epsilon$:

$$x_{t+1} = g(x_t, f(x_t, r_{t+1})) + e_t \tag{6}$$

is finite gain stable.

In Definition 2.2 the reference signal is assumed to belong to a compact set $B_{\bar r} \subset X$, i.e. the reference is bounded in
norm by the scalar $\bar r$ and it is never outside the set where the state trajectory shall be confined.

Assumption 4 The system (1) is $\gamma$-stabilizable for some $\gamma < \infty$.

We can finally state the problem that we address. 

Problem 2.1 Use the batch of data $\mathcal{D}_N$, collected up to $t = 0$, to design a feedback controller whose aim is to track a
desired reference signal $r_t \in B_{\bar r}$ for $t \ge 0$. Once the controller is in operation, carry out on-line refinement of the design
by exploiting the incoming input and state measurements, while keeping the closed loop system finite gain stable.


3 On-line direct control design method 

We approach Problem 2.1 from the point of view of data-driven, direct dynamic inversion techniques. In this context,
we assume the existence of an “optimal” (in a sense that will be shortly specified) inverse of the system’s dynamics (1)
among the functions that, if used as controller, stabilize the closed-loop system. Then, we build from the available prior 
knowledge and data a set of functions that is guaranteed to contain the optimal inverse, and we exploit such a set to derive 
an approximated inverse, which we use as feedback controller. This approach involves several preliminary ingredients, 
explained in the following sub-sections. 

3.1 Optimal inverse and controller structure 

Following the definitions and notation introduced in [?], for a given control function $f$ we define the point-wise inversion
error as:

$$IE(f, r, x, e) = \|r - g(x, f(r, x)) - e\|, \tag{7}$$

and the global inversion error as:

$$GIE(f) = \|IE(f, \cdot, \cdot, \cdot)\|, \tag{8}$$

where $\|\cdot\|$ in (8) is a suitable function norm (e.g. $L_\infty$) evaluated on $X \times B_{\bar r} \times B_\epsilon$. Based on Assumption 4 there exists
a set $S$ containing all functions $f$ that stabilize the closed loop system. Then, we define the optimal inverse controller
function $f^*$ as:

$$f^* = \arg\min_{f \in S \cap \mathcal{F}_{X \times X}} GIE(f), \tag{9}$$

where $\mathcal{F}_{X \times X}$ denotes the set of all Lipschitz continuous functions on $X \times X$. We denote the Lipschitz constant of $f^*$
with $\gamma^*$, and the related constants $\lambda_1$, $\lambda_2$ and $\beta$, obtained if the controller $f^*$ were used in closed-loop (see (5)), by $\lambda_1^*$, $\lambda_2^*$
and $\beta^*$.

Considering the measured data available up to a generic time t, we can write the control input as: 

$$u_t = f^*(w_t) + d_t, \tag{10}$$

where $d_t$ is a signal accounting for the unmeasured noise and disturbances and possible inversion errors. From Assumptions
1 and 2 it holds that as long as the state and input trajectories evolve in the sets $X$ and $U$, respectively, the scalar $d_t$
has to be bounded, i.e. $d_t \in B_\delta \subset \mathbb{R}$ with $\delta$ being a positive constant. Then, following a set membership identification
approach (see e.g. [?]), we consider the set of feasible inverse functions at time step $t$ ($FIFS_t$), i.e. the set of all
functions $f \in \mathcal{F}_{X \times X}$ that are consistent with the available data and prior information:

$$FIFS_t = \bigcap_{j=-N,\ldots,t-1} H_j, \tag{11}$$

where:

$$H_j = \{f \in \mathcal{F}_{X \times X} : |u_j - f(w_j)| \le \delta\}. \tag{12}$$

The inequality in (12) stems from the observation that the difference between the measured input $u_j$ and the value of the function $f^*$ evaluated at the
corresponding $w_j$ can not be larger than the bound on the amplitude of the signal $d_t$.

Under Assumptions 1-2, if $u_t \in U$ and $w_t \in X \times X$, $\forall t \ge -N$, then the optimal inverse $f^*$ belongs to $FIFS_t$,
i.e. $f^* \in FIFS_t$ for all $t$. In set membership identification, an estimate $f \approx f^*$ belonging to the set $FIFS_t$ enjoys
a guaranteed worst-case approximation error not larger than twice the minimal one that can be achieved (see e.g. [?] for
details). Motivated by this accuracy guarantee, we update the approximation $f_t$ of the optimal inverse controller $f^*$
on-line in order to approach the set $FIFS_t$. First, in order to have a tractable computational problem, we parameterize the
controller $f_t$ with a finite sum of kernel functions:

$$f_t(w) = a_t^T K(w, W_t),$$

where $a_t \in \mathbb{R}^{L_t}$ is the vector of weights, and $K(w, W_t) = [\kappa(w, \tilde w_1), \ldots, \kappa(w, \tilde w_{L_t})]^T$ is a vector of kernel functions
$\kappa(\cdot, \tilde w_i) : \mathbb{R}^{2n_x} \to \mathbb{R}$, $i = 1, \ldots, L_t$, belonging to a dictionary that is uniquely determined by the $L_t$ kernel function
centers $W_t = \{\tilde w_1, \ldots, \tilde w_{L_t}\}$. Then, at each time $t$ we update the set $W_t$, which determines the kernel function dictionary,
and we also recursively update the weights $a_t$ exploiting the knowledge of $FIFS_t$ (11), with an approach inspired by the
projection-based learning scheme presented in [4] in the context of signal processing. Moreover, in order to achieve finite
gain stability of the closed-loop system, we exploit the information that $f^* \in FIFS_0$ to derive a robust constraint on the
vector of weights $a_t$, which we impose in the on-line procedure.

In the following, we provide the details of these steps and at the end we summarize the overall design method. 


3.2 Robust inequality to enforce closed loop stability 

We require the approximated inverse, $f_t$, to satisfy the following inequality at each time step $t \ge 0$:

$$|f_t(w_t^+) - f^*(w_t^+)| \le \gamma_\Delta \|x_t\| + \sigma, \quad \forall f^* \in \mathcal{F}(\gamma^*, X \times X) \cap FIFS_0, \ \forall t \ge 0, \tag{13}$$

where $w_t^+ = [x_t, r_{t+1}]^T$ and $\gamma_\Delta, \sigma \in \mathbb{R}$, $\gamma_\Delta, \sigma > 0$, are design parameters. Guidelines on how these parameters should
be selected in order to guarantee finite gain stability of the closed-loop are given in Section 4. The idea behind (13) is to
limit the discrepancy between the input computed by the approximate inverse $f_t$ at time step $t$, i.e. $u_t = f_t(w_t^+)$, and the
one given by the optimal inverse $f^*$ to a sufficiently small value, which depends linearly on the norm of the current state.
However, since the optimal inverse $f^*$ is not known, we require the inequality (13) to be satisfied robustly for all functions
in $FIFS_0$ that have the Lipschitz constant equal to $\gamma^*$. As mentioned above, such a function set is in fact guaranteed to
contain $f^*$ under our working assumptions.

To translate the inequality (13) into a computationally tractable constraint on the parameters $a_t$, we exploit the
information that $f^* \in \mathcal{F}(\gamma^*, X \times X) \cap FIFS_0$ to compute tight upper and lower bounds on $f^*(w_t^+)$ using the following result
from [?].

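Theorem 3.1 (Theorem 2 in [?]) Given a nonlinear function $f^* \in \mathcal{F}(\gamma^*, X \times X) \cap FIFS_0$, the following inequality
holds:

$$\underline{f}(w) \le f^*(w) \le \overline{f}(w),$$

where:

$$\begin{aligned}
\overline{f}(w) &= \min_{k=-N,\ldots,-1} \left(u_k + \delta + \gamma^* \|w - w_k\|\right) \\
\underline{f}(w) &= \max_{k=-N,\ldots,-1} \left(u_k - \delta - \gamma^* \|w - w_k\|\right).
\end{aligned} \tag{14}$$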
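As a rough illustration, the bounds in (14) can be evaluated directly from the stored data. The following Python sketch assumes the Euclidean norm; the names `fifs_bounds`, `data_w` and `data_u` are hypothetical and do not come from the text.

```python
import math


def norm2(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(c * c for c in v))


def fifs_bounds(w, data_w, data_u, delta, gamma_star):
    """Return (lower, upper) bounds on f*(w) in the form of Theorem 3.1.

    data_w: list of regressors w_k, data_u: list of inputs u_k,
    delta: bound on d_t, gamma_star: Lipschitz constant of f*.
    """
    dists = [norm2([a - b for a, b in zip(w, wk)]) for wk in data_w]
    upper = min(u + delta + gamma_star * d for u, d in zip(data_u, dists))
    lower = max(u - delta - gamma_star * d for u, d in zip(data_u, dists))
    return lower, upper


# Two training points; query the midpoint between them.
lower, upper = fifs_bounds([0.5, 0.0], [[0.0, 0.0], [1.0, 0.0]],
                           [0.0, 1.0], delta=0.1, gamma_star=1.0)
```

The gap `upper - lower` shrinks as the query point gets closer to the training regressors, which is why richer training data allow a smaller value of the design parameter σ.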
Exploiting Theorem 3.1, the robust constraint (13) can be satisfied by enforcing the following two inequalities on the
vector of weights $a_t$:

$$\begin{aligned}
-\gamma_\Delta \|x_t\| - \sigma + \overline{f}(w_t^+) &\le a_t^T K(w_t^+, W_t) \\
a_t^T K(w_t^+, W_t) &\le \gamma_\Delta \|x_t\| + \sigma + \underline{f}(w_t^+).
\end{aligned} \tag{15}$$

Note that the value of $\gamma^*$ that is required in order to enforce the constraints in (15) needs to be estimated from the available
measurement data (see also Section 4). We describe next the approach to update the dictionary of kernel functions and
the vector of weights $a_t$.


3.3 Updating the dictionary of kernel functions 

The data generated by any Lipschitz continuous nonlinear function evaluated at a finite number of points can be well 
approximated by a dictionary of kernel functions centered at the same points. In our on-line controller design, we let 
the dictionary grow and incorporate new kernel functions as new input and state measurements are collected. However, 
adding a new function to the dictionary at each time step would lead to an unlimited growth of the dictionary size L t over 
time. Moreover, this would result in a dictionary that is not sparse, i.e. with many functions that are similar (centered at 
points close to each other), and with possible over-fitting of the measurement data. To avoid these problems, we choose 
to add a new function only if it is sufficiently different from those already contained in the dictionary. As indicator of 
similarity, we use the so-called coherence factor (see e.g. [?] for more details):

$$\mu(w, W_t) = \max_{i=1,\ldots,L_t} |\kappa(w, \tilde w_i)|. \tag{16}$$

Note that $\mu(w, W_t) \in (0, 1]$, and that $\mu(w, W_t) = 1$ if and only if $w \in W_t$. Hence, the larger the coherence value in (16),
the more similar is the kernel function centered at $w$ to some function already in the dictionary. In our design technique,
we set a threshold $\bar\mu \in (0, 1)$ and we add a particular data point $w$ to the set of function centers $W_t$ only if $\mu(w, W_t) \le \bar\mu$.
This approach guarantees that the size of the dictionary will remain bounded over time (see e.g. [?]).
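The coherence test can be sketched in a few lines of Python. The text does not fix a kernel, so the Gaussian kernel and its `width` parameter below are assumptions made only for illustration, as are the function names.

```python
import math


def gaussian_kernel(w, center, width=1.0):
    """Assumed kernel: equals 1 only when w coincides with the center."""
    d2 = sum((a - b) ** 2 for a, b in zip(w, center))
    return math.exp(-d2 / (2.0 * width ** 2))


def coherence(w, centers, kernel=gaussian_kernel):
    """Coherence factor: largest kernel value between w and any center."""
    return max(abs(kernel(w, c)) for c in centers)


def maybe_add_center(w, centers, mu_bar, kernel=gaussian_kernel):
    """Append w to the dictionary only if it is sufficiently novel."""
    if not centers or coherence(w, centers, kernel) <= mu_bar:
        centers.append(list(w))
        return True
    return False


centers = []
maybe_add_center([0.0, 0.0], centers, mu_bar=0.5)   # first point is always added
maybe_add_center([0.01, 0.0], centers, mu_bar=0.5)  # too similar, rejected
maybe_add_center([3.0, 0.0], centers, mu_bar=0.5)   # far away, added
```

A smaller threshold yields a sparser dictionary at the price of a coarser approximation.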


3.4 Updating the vector of weights 

As a preliminary step to the recursive update of the weights $a_t$ note that, as discussed above, the size of the dictionary can
expand from time step $t-1$ to time step $t$ and therefore in general it will hold that $a_{t-1} \in \mathbb{R}^{L_{t-1}}$ and $a_t \in \mathbb{R}^{L_t}$ with
$L_{t-1} \le L_t$. Therefore, in order to properly define the updating algorithm at time $t$, we consider the vector $a_{t-1}^+ \in \mathbb{R}^{L_t}$:

$$a_{t-1}^+ = [a_{t-1}^T, \underbrace{0, \ldots, 0}_{L_t - L_{t-1}}]^T, \tag{17}$$

obtained by initializing the weights corresponding to the kernel functions that are added to the dictionary to zero.

To introduce the updating of the vector $a_t \in \mathbb{R}^{L_t}$, we note that each pair $(u_j, w_j)$, $j = -N, \ldots, t-1$ defines, together
with the dictionary of kernel functions at time step $t$, the following set:

$$S_{jt} = \{a \in \mathbb{R}^{L_t} : |a^T K(w_j, W_t) - u_j| \le \delta\}, \tag{18}$$

which is a strip (hyperslab) in $\mathbb{R}^{L_t}$. If $a_t \in S_{jt}$, then the corresponding function $f_t$ in (13) belongs to the set $H_j$ defined
in (12). We further define the projection of a point in $\mathbb{R}^{L_t}$ onto the strip $S_{jt}$ as:

$$P_{jt}(a) = \arg\min_{\hat a \in S_{jt}} \|a - \hat a\|_2. \tag{19}$$

Note that calculating the projection (19) amounts to solving a very simple optimization problem, whose solution can be explicitly
derived (see e.g. [6]). Therefore, calculating the projection of any point in $\mathbb{R}^{L_t}$ onto a measurement strip as in (19) can
be done computationally very efficiently. Finally, we consider the hyperslab defined by the stability constraint (13):


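$$S_t^+ = \left\{ a \in \mathbb{R}^{L_t} :
\begin{array}{l}
a^T K(w_t^+, W_t) \ge -\gamma_\Delta \|x_t\| - \sigma + \overline{f}(w_t^+) \\
a^T K(w_t^+, W_t) \le \gamma_\Delta \|x_t\| + \sigma + \underline{f}(w_t^+)
\end{array} \right\}, \tag{20}$$

and we denote the corresponding projection operator with $P_t^+(\cdot)$.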
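For illustration, the Euclidean projection onto a hyperslab of the form $\{a : lo \le k^T a \le hi\}$ has a well-known closed form, which covers both the measurement strips (with $lo = u_j - \delta$, $hi = u_j + \delta$) and the stability hyperslab. The Python sketch below uses hypothetical names and is not taken from the cited reference.

```python
def project_onto_hyperslab(a, k, lo, hi):
    """Euclidean projection of a onto {x : lo <= k.x <= hi}.

    a, k: lists of floats of equal length; assumes lo <= hi.
    """
    s = sum(ai * ki for ai, ki in zip(a, k))   # current value of k.a
    k2 = sum(ki * ki for ki in k)              # squared norm of k
    if k2 == 0.0 or lo <= s <= hi:
        return list(a)                         # already inside (or degenerate k)
    # Move along k by the signed violation of the nearest face.
    shift = (hi - s) if s > hi else (lo - s)
    return [ai + shift * ki / k2 for ai, ki in zip(a, k)]


# A point outside the strip |a_1| <= 0.5 is moved onto its nearest face.
p = project_onto_hyperslab([2.0, 0.0], [1.0, 0.0], -0.5, 0.5)
```

The cost is linear in the dictionary size, which is consistent with the remark that these projections are computationally cheap.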
From the definitions of the hyperslabs $S_{jt}$ and $S_t^+$ in (18) and (20), it follows that if

$$a_t \in S_t^+ \cap \Big( \bigcap_{j=-N,\ldots,t-1} S_{jt} \Big),$$

then the corresponding function $f_t$ belongs to the set $FIFS_t$ and satisfies the stabilizing constraint (13). However, finding
a point that belongs to the intersection of all $S_{jt}$, $j = -N, \ldots, t-1$ at each time step is computationally challenging.
Therefore, we exploit the idea at the basis of projection learning algorithms, namely that by repeatedly applying the projection
operators to a point, the result will eventually fall in the intersection of the considered hyperslabs. In particular, we update
the vector of weights $a_t$ in two steps: first, following the idea of [?], we calculate a convex combination of its projections
onto the hyperslabs defined by a finite number $q \ge 1$ of the latest measurements; then, we project the obtained point
onto the hyperslab $S_t^+$ in order to ensure the satisfaction of the stabilizing constraint (13). To be more specific, let the
set of indexes $J_t = \{\max\{-N, t-q\}, \ldots, t-1\}$ contain the time instants of the last $q$ state and input measurements,
and let $I_t = \{j \in J_t : a_{t-1}^+ \notin S_{jt}\}$ be the subset of indexes such that the weighting vector $a_{t-1}^+$ does not belong to the
corresponding hyperslabs. Then, we compute our update of the weighting vector $a_t$ from $a_{t-1}^+$ as:

$$a_t = P_t^+ \Big( a_{t-1}^+ + \sum_{j \in I_t} \frac{1}{\mathrm{card}(I_t)} \left( P_{jt}(a_{t-1}^+) - a_{t-1}^+ \right) \Big), \tag{21}$$


where $\mathrm{card}(I_t)$ denotes the number of elements in $I_t$. This update can be computed very efficiently with the explicit
formulas for vector projections and possibly by parallelizing the projection operations.
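The two-step update described above (an equal-weight convex combination of the projections onto the violated measurement hyperslabs, followed by a projection onto the stability hyperslab) can be sketched as follows. This is a minimal Python illustration under stated assumptions: kernel vectors are precomputed, the stability hyperslab is non-empty, and all names are hypothetical.

```python
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _project(a, k, lo, hi):
    """Euclidean projection of a onto {x : lo <= k.x <= hi}."""
    s, k2 = _dot(a, k), _dot(k, k)
    if k2 == 0.0 or lo <= s <= hi:
        return list(a)
    shift = (hi - s) if s > hi else (lo - s)
    return [ai + shift * ki / k2 for ai, ki in zip(a, k)]


def update_weights(a_prev, recent, delta, k_plus, lo_plus, hi_plus):
    """One weight update.

    a_prev: zero-padded previous weights.
    recent: list of (k_j, u_j) for the last q measurements, where k_j is the
            kernel vector evaluated at w_j.
    k_plus, lo_plus, hi_plus: kernel vector and bounds of the stability slab.
    """
    # Keep only the measurement strips that a_prev violates.
    violated = [(k, u) for k, u in recent if abs(_dot(a_prev, k) - u) > delta]
    a = list(a_prev)
    if violated:
        projs = [_project(a_prev, k, u - delta, u + delta) for k, u in violated]
        a = [sum(p[i] for p in projs) / len(projs) for i in range(len(a_prev))]
    # Final projection enforces the stability constraint.
    return _project(a, k_plus, lo_plus, hi_plus)


a_new = update_weights([0.0, 0.0],
                       [([1.0, 0.0], 1.0), ([0.0, 1.0], 1.0)],
                       delta=0.1, k_plus=[1.0, 1.0], lo_plus=-10.0, hi_plus=0.5)
```

Because the last operation is the projection onto the stability hyperslab, the returned weights satisfy that constraint even when the measurement strips are only approached and not yet met.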


3.5 Summary of the proposed design algorithm 

The described procedures to update the dictionary of kernel functions and the weights $a_t$ form our on-line scheme to
compute the feedback controller $f_t$, summarized in Algorithm 1.


Algorithm 1 Feedback control algorithm based on the on-line direct control design scheme

1) Collect the state measurement $x_t$. If $t < 0$, set $w_t^+ = [x_t, x_{t+1}]^T$, otherwise set $w_t^+ = [x_t, r_{t+1}]^T$.

2) Update the dictionary $W_t$ starting from $W_{t-1}$ and adding $w_t^+$ if $\mu(w_t^+, W_{t-1}) \le \bar\mu$ and $w_{t-1}$ if $\mu(w_{t-1}, W_{t-1}) \le \bar\mu$.
Form the vector $a_{t-1}^+$ according to (17).

3) Calculate $a_t$ according to (21).

4) If $t \ge 0$, calculate the input $u_t = a_t^T K(w_t^+, W_t)$ and apply it to the plant.

5) Set $t = t + 1$ and go to 1).


For $t \ge 0$, such an algorithm is both a controller and a design algorithm, while for $t < 0$ it only acts as a design
algorithm.


4 Algorithm tuning 


In order to implement Algorithm 1 several tuning parameters need to be selected. These are the noise bound $\delta$ and
the Lipschitz constant $\gamma^*$, which are required for calculating the projections on the hyperslabs $S_{jt}$ and $S_t^+$. In addition,
the parameters $\gamma_\Delta$ and $\sigma$ in (13) need to be selected. Careful selection of these parameters guarantees finite gain stability of
the closed loop. Namely, the parameters $\gamma_\Delta$ and $\sigma$ should be selected such that:

$$\gamma_\Delta \in \left(0, \frac{1}{\gamma_g \lambda_2^*}\right) \tag{22}$$

and

$$\sigma \ge \frac{1}{2} D_\delta, \tag{23}$$
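where

$$D_\delta = \sup_{w \in B_{\bar x \bar r}} \left(\overline{f}(w) - \underline{f}(w)\right), \tag{24}$$

with

$$B_{\bar x \bar r} = \{w \in \mathbb{R}^{n_x} \times \mathbb{R}^{n_x} : w = (x, r), \forall x \in B_{\bar x}, \forall r \in B_{\bar r}\}, \tag{25}$$

and with $\bar x$ given as

$$\bar x = \frac{\lambda_1^* \bar r + \gamma_g \lambda_2^* \sigma + \lambda_2^* \epsilon + \beta^*}{1 - \gamma_g \lambda_2^* \gamma_\Delta}. \tag{26}$$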
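As a quick numerical illustration of how these quantities interact, the Python sketch below evaluates the admissible upper limit for $\gamma_\Delta$ and the resulting state bound, assuming they take the form $1/(\gamma_g \lambda_2^*)$ and $(\lambda_1^* \bar r + \gamma_g \lambda_2^* \sigma + \lambda_2^* \epsilon + \beta^*)/(1 - \gamma_g \lambda_2^* \gamma_\Delta)$ respectively. The function names and the example values are hypothetical.

```python
def gamma_delta_limit(gamma_g, lambda2):
    """Upper limit (exclusive) of the admissible interval for gamma_delta."""
    return 1.0 / (gamma_g * lambda2)


def state_bound(r_bar, sigma, eps, gamma_g, gamma_delta,
                lambda1, lambda2, beta):
    """Bound on the state amplitude for given tuning parameters."""
    denom = 1.0 - gamma_g * lambda2 * gamma_delta
    if denom <= 0.0:
        raise ValueError("gamma_delta must be below 1/(gamma_g*lambda2)")
    num = lambda1 * r_bar + gamma_g * lambda2 * sigma + lambda2 * eps + beta
    return num / denom


limit = gamma_delta_limit(gamma_g=2.0, lambda2=1.0)
x_bar = state_bound(r_bar=1.0, sigma=0.1, eps=0.05, gamma_g=2.0,
                    gamma_delta=0.25, lambda1=1.0, lambda2=1.0, beta=0.0)
```

The bound grows without limit as $\gamma_\Delta$ approaches its upper limit, so values well inside the interval give a tighter guaranteed region.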


In order to verify whether (22) and (23) hold, the values of $\epsilon$, $\delta$, $\gamma_g$, $\gamma^*$, $\lambda_1^*$, $\lambda_2^*$ and $\beta^*$ should be known. The values
of $\lambda_1^*$, $\lambda_2^*$ and $\beta^*$ can not be estimated based on the available data and they have to be guessed. Since these parameters
are related to the performance of the optimal inverse controller $f^*$, which should typically result in a small tracking error (i.e.
the state of the corresponding closed loop system should be close to the desired reference signal), a reasonable guess for
$\lambda_1^*$ and $\lambda_2^*$ is a value slightly greater than 1 and for $\beta^*$ a value close to 0. The Lipschitz constants $\gamma_g$ and $\gamma^*$ as well as the
disturbance bounds $\epsilon$ and $\delta$ can be estimated from the available training data (see e.g. [?]). In order to be able to use
parameter estimates that might be subject to estimation errors while still retaining the stability guarantees, the obtained
estimates can be inflated by positive constants that reflect the level of estimation uncertainty, i.e. the parameters $\epsilon$, $\delta$, $\gamma_g$
and $\gamma^*$ can be selected as: $\epsilon = \hat\epsilon_{-1} + c_\epsilon$, $\delta = \hat\delta_{-1} + c_\delta$, $\gamma_g = \hat\gamma_{g,-1} + c_{\gamma_g}$ and $\gamma^* = \hat\gamma^*_{-1} + c_{\gamma^*}$, where $\hat\epsilon_{-1}$, $\hat\delta_{-1}$, $\hat\gamma_{g,-1}$
and $\hat\gamma^*_{-1}$ denote the parameter estimates obtained by applying the algorithms described in [?] to the training data $\mathcal{D}_N$ and
$c_\epsilon$, $c_\delta$, $c_{\gamma_g}$ and $c_{\gamma^*}$ are positive constants that should be selected by the control designer and that should reflect the designer's
assessment of the size of the possible estimation error. Hence, in order to ensure the satisfaction of (22) and (23), the parameters $\gamma_\Delta$ and
$\sigma$ can be selected such that $\gamma_\Delta \in \left(0, \frac{1}{(\hat\gamma_{g,-1} + c_{\gamma_g}) \lambda_2^*}\right)$ and

$$\sigma > \frac{1}{2} \sup_{w \in B_{\bar x \bar r}} \left(\overline{f}_c(w) - \underline{f}_c(w)\right), \tag{27}$$

where

$$\begin{aligned}
\overline{f}_c(w) &= \min_{k=-N,\ldots,-1} \left(u_k + \hat\delta_{-1} + c_\delta + (\hat\gamma^*_{-1} + c_{\gamma^*}) \|w - w_k\|\right) \\
\underline{f}_c(w) &= \max_{k=-N,\ldots,-1} \left(u_k - \hat\delta_{-1} - c_\delta - (\hat\gamma^*_{-1} + c_{\gamma^*}) \|w - w_k\|\right).
\end{aligned} \tag{28}$$


However, some of these parameters can also be updated on-line, which should increase their accuracy as new input and 
state measurements are collected and hence increase the overall performance of the controller. In the following section 
we describe how some of the tuning parameters can be updated on-line and we prove the finite gain stability of the closed 
loop system in this case. 


4.1 On-line adaptation of some of the tuning parameters 

Note that the value of $\epsilon$ is required for selecting the tuning parameter $\sigma$. Recalculating the value of $\sigma$ that satisfies (23)
over time would be computationally demanding and therefore we do not consider updating of the parameter $\sigma$ and the
noise bound estimate $\hat\epsilon$ over time. Hence the parameter $\epsilon$ can be selected by properly inflating the estimate $\hat\epsilon_{-1}$ obtained
from the initially available training data and $\sigma$ can be selected such that it satisfies (27). On the other hand, selecting
$\gamma_\Delta$ that satisfies (22) based on $\gamma_g$ is very easy and therefore we will consider its modification over time. Moreover, the
updating of the parameters $\delta$ and $\gamma^*$ that influence the projections done under Algorithm 1 is also considered.

To this end, we consider a generic nonlinear function $f' : \mathbb{R}^{n_\xi} \to \mathbb{R}^{n_z}$ with $n_\xi$ inputs and $n_z$ outputs that is Lipschitz
continuous with the constant $\gamma$ and whose output is corrupted by disturbances $o_t$ as:

$$z_t = f'(\xi_t) + o_t, \tag{29}$$

where $o_t \in \mathbb{R}^{n_z}$, $\|o_t\| \le \epsilon$, $\forall t$. We introduce two on-line algorithms for updating the estimates of the noise bound and the
Lipschitz constant over time, that we denote by $\hat\epsilon_t$ and $\hat\gamma_t$ at time step $t$ respectively.

Algorithms 2 and 3 can be run consecutively in order to update the estimates of $\epsilon$ and $\gamma$ at each time step. These
two algorithms can be seen as on-line versions of the estimation methods proposed in [?] and [7]. In order to limit the
memory requirements of the proposed algorithms, we introduce a memory horizon $\bar N$ which denotes the maximal number
Algorithm 2 On-line estimation of the noise bound

1) Choose a “small” $\rho > 0$, for example $\rho = 0.01 \max_{i,j=-N,\ldots,-1} \|\xi_i - \xi_j\|$, initialize $\hat\epsilon_{-N}$ to 0 and set $t = -N + 1$.

2) Find the set of indexes: $J_t = \{k \in [\max\{-N, t - \bar N\}, \ldots, t] : \|\xi_t - \xi_k\| \le \rho\}$.

3) If $J_t = \emptyset$ set $\epsilon_z = 0$, otherwise set $\epsilon_z = \frac{1}{2} \max_{i \in J_t} \|z_t - z_i\|$.

4) Calculate $\hat\epsilon_t = \max\{\hat\epsilon_{t-1}, \epsilon_z\}$.

5) Set $t = t + 1$ and go to 2).
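The steps of Algorithm 2 can be sketched as a small Python class. This is a minimal illustration, not the authors' implementation: it assumes the Euclidean norm, compares each new sample only with earlier samples kept in a bounded buffer (the memory horizon), and assumes a factor of one half in front of the largest output difference. The class and attribute names are hypothetical.

```python
import math
from collections import deque


def dist(a, b):
    """Euclidean distance between two vectors given as lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class NoiseBoundEstimator:
    def __init__(self, rho, memory):
        self.rho = rho                      # radius defining "close" regressors
        self.buffer = deque(maxlen=memory)  # last `memory` pairs (xi, z)
        self.eps_hat = 0.0                  # current noise bound estimate

    def update(self, xi, z):
        """Process one new pair (xi_t, z_t) and return the updated estimate."""
        close = [zk for xk, zk in self.buffer if dist(xi, xk) <= self.rho]
        # Outputs at nearly identical regressors differ mainly due to noise.
        eps_z = 0.5 * max(dist(z, zk) for zk in close) if close else 0.0
        self.eps_hat = max(self.eps_hat, eps_z)   # estimate never decreases
        self.buffer.append((list(xi), list(z)))
        return self.eps_hat


est = NoiseBoundEstimator(rho=0.1, memory=10)
for xi, z in [([0.0], [1.0]), ([0.05], [1.4]), ([5.0], [9.0]), ([0.0], [0.8])]:
    est.update(xi, z)
```

The bounded `deque` plays the role of the memory horizon: older samples are discarded automatically once the buffer is full.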


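Algorithm 3 On-line estimation of the Lipschitz constant

1) Initialize $\hat\gamma_{-N}$, $\hat\gamma^{current}_{-N}$ and $\Delta^{current}_{-N}$ to 0 and set $t = -N + 1$.

2) Calculate $\Delta_{kt} = \|\xi_k - \xi_t\|$, $k = \max\{-N, t - \bar N\}, \ldots, t - 1$.

3) For $k = \max\{-N, t - \bar N\}, \ldots, t - 1$ and $\Delta_{kt} \ne 0$ calculate:

$$\tilde\gamma_{kt}^{t} = \begin{cases} \dfrac{\|z_t - z_k\| - 2\hat\epsilon_t}{\Delta_{kt}} & \text{if } \|z_t - z_k\| > 2\hat\epsilon_t \\ 0 & \text{otherwise.} \end{cases} \tag{30}$$

If $\Delta_{kt} = 0$ set $\tilde\gamma_{kt}^{t} = 0$.

4) For $j = \max\{-N, t - \bar N\}, \ldots, t - 1$ and $i = \max\{-N, t - \bar N\}, \ldots, j - 1$ calculate: $\tilde\gamma_{ij}^{t} = \tilde\gamma_{ij}^{t-1} - 2 \frac{\hat\epsilon_t - \hat\epsilon_{t-1}}{\Delta_{ij}}$. In
addition, calculate $\hat\gamma_t^{current} = \hat\gamma_{t-1}^{current} - 2 \frac{\hat\epsilon_t - \hat\epsilon_{t-1}}{\Delta_{t-1}^{current}}$. If $\Delta_{ij} = 0$ or $\Delta_{t-1}^{current} = 0$, set $\tilde\gamma_{ij}^{t} = \tilde\gamma_{ij}^{t-1}$ and $\hat\gamma_t^{current} = \hat\gamma_{t-1}^{current}$.

5) Calculate:

$$\tilde\gamma_t = \max_{\substack{j = \max\{-N, t - \bar N\}, \ldots, t \\ i = \max\{-N, t - \bar N\}, \ldots, j - 1}} \{\tilde\gamma_{ij}^{t}\} \tag{31}$$

$$\hat\gamma_t = \max\{\tilde\gamma_t, \hat\gamma_t^{current}\}. \tag{32}$$

If $\hat\gamma_t = \tilde\gamma_t$ set $\Delta_t^{current} = \Delta_{pq}$, where $\hat\gamma_t = \tilde\gamma_{pq}^{t}$. Otherwise set $\Delta_t^{current} = \Delta_{t-1}^{current}$. Set $\hat\gamma_t^{current} = \hat\gamma_t$.

6) Set $t = t + 1$ and go to 2).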
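The core of the Lipschitz estimate is the noise-corrected pairwise slope used in step 3. The Python sketch below is a simplified batch version of that idea: it recomputes the largest corrected slope over all stored pairs instead of performing the recursive bookkeeping of steps 4 and 5. It assumes the Euclidean norm, and the function name is hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two vectors given as lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def lipschitz_estimate(xis, zs, eps_hat):
    """Largest noise-corrected slope over all pairs of samples.

    xis: list of regressors, zs: list of outputs, eps_hat: noise bound estimate.
    """
    best = 0.0
    for i in range(len(xis)):
        for j in range(i):
            d = dist(xis[i], xis[j])
            # Subtract 2*eps_hat: each of the two outputs may carry noise.
            gap = dist(zs[i], zs[j]) - 2.0 * eps_hat
            if d > 0.0 and gap > 0.0:
                best = max(best, gap / d)
    return best


gamma_hat = lipschitz_estimate([[0.0], [1.0], [3.0]],
                               [[0.0], [2.0], [3.0]], eps_hat=0.25)
```

A larger noise bound estimate lowers every corrected slope, which is the reason the recursive algorithm has to revise earlier slopes whenever the noise bound estimate increases.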
of measurement points that need to be stored in memory during the execution of the algorithms. By setting $\bar N = N$, the
functionality of the proposed algorithms becomes equivalent to the functionality of their off-line counterparts.

Algorithm 2 can be used in order to estimate the noise bound $\delta$. We denote this estimate at time step $t$ by $\hat\delta_t$. The
estimate $\hat\delta_t$ and Algorithm 3 can be used in order to update the estimate of the Lipschitz constant $\gamma^*$, that we denote
by $\hat\gamma_t^*$. This can be done by setting the function $f'$ in (29) to $f^*$, $z_t$ to $u_t$ and $\xi_t$ to $w_t$. The Lipschitz constant of the function $g$
with respect to the input $u$, $\gamma_g$, can be estimated by considering a reformulation of the state evolution equation (1) given
by:

$$x_{t+1} = g'(u_t) + \vartheta_t, \tag{33}$$

where $g'(u_t) = g(x^*, u_t)$ is the unknown function with the Lipschitz constant $\gamma_g$ and $x^*$ is given by:

$$\begin{aligned}
x^* &= \arg\max_{x \in B_{\bar x}} C_g(x) \\
C_g(x) &= \max_{u_1, u_2 \in U} \frac{\|g(x, u_1) - g(x, u_2)\|}{|u_1 - u_2|},
\end{aligned} \tag{34}$$

and

$$\vartheta_t = g(x_t, u_t) - g(x^*, u_t) + e_t \tag{35}$$

is an unknown disturbance signal. From Assumptions 1 and 2 it follows that $\vartheta_t$ is bounded if $u_t \in U$ and $x_t \in X$,
$\forall t \ge -N$, i.e. $\vartheta_t \in B_\zeta$, $\forall t \ge -N$, where $B_\zeta = \{\vartheta \in \mathbb{R}^{n_x} : \|\vartheta\|_\infty \le \zeta\}$. Therefore, Algorithm 2 can be used to
recursively update the bound of the disturbance and then Algorithm 3 can be used in order to estimate the Lipschitz
constant $\gamma_g$. In this case the function $f'$ in (29) should be set to $g'$, $z_t$ should be set to $x_{t+1}$ and $\xi_t$ to $u_t$. We will denote the
estimate of the Lipschitz constant $\gamma_g$ at time step $t$ by $\hat\gamma_{g,t}$.

In order to state the conditions under which Algorithms 2 and 3 can be used together with Algorithm 1 in order
to update some of its tuning parameters on-line, while preserving the finite gain stability of the closed loop system, we
introduce the following assumption on the initially available data set $\mathcal{D}_N$.


Assumption 5 The initially collected training data $\mathcal{D}_N$ are such that as $N \to \infty$, for any pair $(x, d) \in B_{\bar x} \times B_\delta$ and any $\lambda \in \mathbb{R}$,
$\lambda > 0$ there exists a finite $N_\lambda \in \mathbb{N}$, $N_\lambda < \infty$ such that a pair $(x_t, d_t)$, $t \in [-N_\lambda, -1]$ satisfying $\|(x, d) - (x_t, d_t)\| \le \lambda$
exists. In addition, for any pair $(u, \vartheta) \in U \times B_\zeta$ and any $\theta \in \mathbb{R}$, $\theta > 0$ there exists a finite $N_\theta \in \mathbb{N}$, $N_\theta < \infty$ such that
a pair $(u_t, \vartheta_t)$, $t \in [-N_\theta, -1]$ satisfying $\|(u, \vartheta) - (u_t, \vartheta_t)\| \le \theta$ exists.


This assumption ensures that the initial training data set is generated in such a way that the plant state $x_t$ and the
disturbance signal $d_t$, as well as the input signal $u_t$ and the disturbance $\vartheta_t$, explore their domains well, i.e. the initially collected
data is informative enough. Based on this assumption, we state the following Lemma on the properties of the estimates $\hat\delta_t$, $\hat\gamma_t^*$
and $\hat\gamma_{g,t}$ obtained by using Algorithms 2 and 3 on the training data $\mathcal{D}_N$, which is a direct consequence of the fact that
the functionality of the introduced algorithms becomes equivalent to the functionality of the off-line methods developed in [7]
when $\bar N$ is set to $N$.

Lemma 4.1 Let Assumption 5 hold and let $\bar N = N$, with $N \to \infty$. If Algorithms 2 and 3 are used to estimate $\delta$,
$\gamma^*$ and $\gamma_g$, it holds that $\hat\delta_t \to \delta$, $\hat\gamma_t^* \to \gamma^*$ and $\hat\gamma_{g,t} \to \gamma_g$ as $t \to \infty$.

Therefore, if the training data were infinitely long and sufficiently informative, then the estimates $\hat\delta_t$, $\hat\gamma_t^*$ and $\hat\gamma_{g,t}$
would converge to the corresponding true values. Based on this, we state the following Lemma that gives bounds on the
estimation error of $\hat\delta_t$, $\hat\gamma_t^*$ and $\hat\gamma_{g,t}$ for $t \ge 0$ in the case when a finite number of training data is available.

Lemma 4.2 Let Assumption 5 hold. For any $c_\delta$, $c_{\gamma^*}$ and $c_{\gamma_g} \in \mathbb{R}$ such that $c_\delta \in (0, \delta)$, $c_{\gamma^*} \in (0, \gamma^*)$ and $c_{\gamma_g} \in
(0, \gamma_g)$, there exists a finite $\underline{N} \in \mathbb{N}$, $\underline{N} < \infty$ such that by setting $N \ge \bar N \ge \underline{N}$ it holds that $\delta - c_\delta \le \hat\delta_t \le \delta$, $\hat\gamma_{g,t} \ge \gamma_g - c_{\gamma_g}$
and $\hat\gamma_t^* \ge \gamma^* - c_{\gamma^*}$, $\forall t \ge 0$ when Algorithms 2 and 3 are used.

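Proof 1 We first note that, due to step 4) of Algorithm 2, the estimates of the noise bounds $\delta$ and $\zeta$ can only
increase over time, i.e. $\hat\delta_{t+1} \ge \hat\delta_t$ and $\hat\zeta_{t+1} \ge \hat\zeta_t$, $\forall t$. From the fact that $\hat\delta_t$ and $\hat\zeta_t$ are calculated by taking the maximum
over the noise bound evaluated for individual data points, it holds that $\hat\delta_t \le \delta$ and $\hat\zeta_t \le \zeta$. In addition, from Lemma 4.1
it follows that for any $c_\delta, c_\zeta \in \mathbb{R}$ such that $c_\delta \in (0, \delta)$, $c_\zeta \in (0, \zeta)$, there exist finite $N_\delta, N_\zeta \in \mathbb{N}$, $N_\delta, N_\zeta < \infty$ such that
by setting $N \ge \bar N \ge \max\{N_\delta, N_\zeta\}$, it holds that $\hat\delta_{-1} \ge \delta - c_\delta$ and $\hat\zeta_{-1} \ge \zeta - c_\zeta$. Therefore if $N \ge \bar N \ge \max\{N_\delta, N_\zeta\}$,
it has to hold that $\delta - c_\delta \le \hat\delta_t \le \delta$ and $\zeta - c_\zeta \le \hat\zeta_t \le \zeta$. Moreover, due to steps 4) and 5) of Algorithm 3
it holds that if $N \ge \bar N \ge \max\{N_\delta, N_\zeta\}$, then $\hat\gamma_t^* \ge \hat\gamma_{-1}^* - \frac{2 c_\delta}{\Delta^{current}_{-1,\gamma^*}}$ and $\hat\gamma_{g,t} \ge \hat\gamma_{g,-1} - \frac{2 c_\zeta}{\Delta^{current}_{-1,\gamma_g}}$, where $\Delta^{current}_{-1,\gamma^*}$ and
$\Delta^{current}_{-1,\gamma_g}$ denote the value of $\Delta_t^{current}$ obtained at time step $t = -1$ when Algorithm 3 is used for estimating $\gamma^*$ and $\gamma_g$
respectively. In addition, from Lemma 4.1 it holds that for any $c'_{\gamma^*} = c_{\gamma^*} - \frac{2 c_\delta}{\Delta^{current}_{-1,\gamma^*}}$ and $c'_{\gamma_g} = c_{\gamma_g} - \frac{2 c_\zeta}{\Delta^{current}_{-1,\gamma_g}}$, there exist
$N'_{\gamma^*}, N'_{\gamma_g} \in \mathbb{N}$, $N'_{\gamma^*}, N'_{\gamma_g} < \infty$ such that by setting $N \ge \bar N \ge \max\{N'_{\gamma^*}, N'_{\gamma_g}\}$ it holds that $|\hat\gamma_{-1}^* - \gamma^*| \le c'_{\gamma^*}$ and
$|\hat\gamma_{g,-1} - \gamma_g| \le c'_{\gamma_g}$. Therefore by setting $N \ge \bar N \ge \underline{N} = \max\{N_\delta, N_\zeta, N'_{\gamma^*}, N'_{\gamma_g}\}$ it will hold that $\delta - c_\delta \le \hat\delta_t \le \delta$,
$\hat\gamma_{g,t} \ge \gamma_g - c_{\gamma_g}$ and $\hat\gamma_t^* \ge \gamma^* - c_{\gamma^*}$, $\forall t \ge 0$. ■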


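Hence according to Lemma 4.2, if the training data set $\mathcal{D}_N$ is informative and long enough, bounds on the accuracy of the
estimates $\hat\delta_t$, $\hat\gamma_t^*$ and $\hat\gamma_{g,t}$ are guaranteed $\forall t$.

In analogy to the definition of the hyperslabs $S_{jt}$ and $S_t^+$ in (18) and (20), we define the hyperslabs $\hat S_{jt}$ and $\hat S_t^+$ that
depend on the time varying estimates $\hat\delta_t$ and $\hat\gamma_t^*$ and the time varying value of the tuning parameter $\gamma_\Delta$ that we denote by
$\gamma_{\Delta,t}$ as:

$$\hat S_{jt} = \{a \in \mathbb{R}^{L_t} : |a^T K(w_j, W_t) - u_j| \le \hat\delta_t\}, \tag{36}$$

$$\hat S_t^+ = \left\{ a \in \mathbb{R}^{L_t} :
\begin{array}{l}
-\gamma_{\Delta,t} \|x_t\| - \sigma + \overline{f}_t(w_t^+) \le a^T K(w_t^+, W_t) \\
a^T K(w_t^+, W_t) \le \gamma_{\Delta,t} \|x_t\| + \sigma + \underline{f}_t(w_t^+)
\end{array} \right\}, \tag{37}$$

where

$$\begin{aligned}
\overline{f}_t(w) &= \min_{k=-N,\ldots,-1} \left(u_k + \hat\delta_{-1} + c_\delta + \bar\gamma_t \|w - w_k\|\right) \\
\underline{f}_t(w) &= \max_{k=-N,\ldots,-1} \left(u_k - \hat\delta_{-1} - c_\delta - \bar\gamma_t \|w - w_k\|\right),
\end{aligned} \tag{38}$$

and $\bar\gamma_t = \min\{\hat\gamma_t^* + c_{\gamma^*}, \hat\gamma_{-1}^* + c_{\gamma^*}\}$, with $\hat\delta_{-1}$ and $\hat\gamma_{-1}^*$ being the estimates of $\delta$ and $\gamma^*$ obtained from the training data $\mathcal{D}_N$
either by using Algorithms 2 and 3 or the method proposed in [?], and $c_\delta \in (0, \delta)$ and $c_{\gamma^*} \in (0, \gamma^*)$ being design parameters.
Based on this, we define the projection update equation that should be used by Algorithm 1 instead of (21) when
Algorithms 2 and 3 are used to update the estimates of $\delta$, $\gamma^*$ and $\gamma_g$ over time:

$$a_t = \hat P_t^+ \Big( a_{t-1}^+ + \sum_{j \in I_t} \frac{1}{\mathrm{card}(I_t)} \left( \hat P_{jt}(a_{t-1}^+) - a_{t-1}^+ \right) \Big), \tag{39}$$

where $\hat P_t^+(\cdot)$ and $\hat P_{jt}(\cdot)$ denote the projection operators onto the hyperslabs $\hat S_t^+$ and $\hat S_{jt}$.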
In addition, in order to account for the fact that the parameter $\gamma_\Delta$ can change over time, we redefine the maximal
achievable state amplitude as

$$\bar x = \frac{\lambda_1^* \bar r + (\hat\gamma_{g,-1} + c_{\gamma_g}) \lambda_2^* \sigma + \lambda_2^* \epsilon + \beta^*}{1 - (\hat\gamma_{g,-1} + c_{\gamma_g}) \lambda_2^* \bar\gamma_\Delta}, \tag{40}$$

where $\hat\gamma_{g,-1}$ is the estimate of $\gamma_g$ obtained from the initially available training data and $\bar\gamma_\Delta$ is a design parameter that
should be selected such that $\bar\gamma_\Delta < \frac{1}{(\hat\gamma_{g,-1} + c_{\gamma_g}) \lambda_2^*}$.

Moreover, in analogy to (22), we make the following assumption about the selection of the tuning parameter $\gamma_{\Delta,t}$:

$$\gamma_{\Delta,t} \in \left(0, \min\left\{\frac{1}{(\hat\gamma_{g,t} + c_{\gamma_g}) \lambda_2^*}, \bar\gamma_\Delta\right\}\right). \tag{41}$$

In order to state the result on finite gain stability of the closed loop we make an additional assumption about the balls
$B_{\bar x}$ and $B_{\bar x \bar r}$ defined by $\bar x$ in (40).


Assumption 6 $B_{\bar x} \subset X$. Moreover, $\forall w \in B_{\bar x \bar r}$, $\forall \Delta u \in [-\bar\gamma_\Delta \bar x - \sigma, \bar\gamma_\Delta \bar x + \sigma]$, $f^*(w) + \Delta u \in U$.

In line with Lemma 4.2, for the selected design parameters $c_\delta$, $c_{\gamma^*}$ and $c_{\gamma_g} \in (0, \gamma_g)$, we denote the minimal possible
length of the training data sequence that still guarantees satisfaction of Lemma 4.2 by $\underline{N}$. We now have all the ingredients
to state the theorem on the conditions under which the on-line application of Algorithm 1 with on-line update of
some of the related tuning parameters results in a closed loop system that is finite gain stable.

Theorem 4.1 Let Assumptions 1-6 hold. If the parameters $\sigma$ and $\gamma_{\Delta,t}$ are selected such that (23) and (41) hold, if
$\hat S_0^+ \ne \emptyset$ and $x_0 \in B_{\bar x}$, then for any reference signal $r_t \in B_{\bar r}$, $\forall t \ge 0$, it holds that $\hat S_t^+ \ne \emptyset$, $\forall t \ge 0$ and the closed loop
system obtained when Algorithm 1 with the weight update equation as in (39) is used is finite gain stable.
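Proof 2 We will prove the theorem by induction. But first, we note that the closed loop system obtained by using the
approximate controller $f_t$ can be represented as:

$$\begin{aligned}
x_{t+1} &= g(x_t, f_t(x_t, r_{t+1})) + e_t = g(x_t, f^*(x_t, r_{t+1})) + e_t + v_t, \\
v_t &= g(x_t, f_t(x_t, r_{t+1})) - g(x_t, f^*(x_t, r_{t+1})).
\end{aligned} \tag{42}$$

From Assumption 4 and the definition of the optimal inverse controller $f^*$ in (9) it holds that:

$$\|x\|_\infty \le \lambda_1^* \|r\|_\infty + \lambda_2^* \|v\|_\infty + \lambda_2^* \|e\|_\infty + \beta^*, \tag{43}$$

where $v = (v_1, v_2, \ldots)$. Moreover, we note that from Assumptions 2 and 6 it follows that:

$$\|v_t\| \le \gamma_g |f_t(x_t, r_{t+1}) - f^*(x_t, r_{t+1})|, \quad \forall (x_t, r_{t+1}) \in B_{\bar x \bar r}. \tag{44}$$

We now employ the inductive argument to show that if $\hat S_0^+ \ne \emptyset$ and $x_0 \in B_{\bar x}$, then $\hat S_t^+ \ne \emptyset$ and $x_t \in B_{\bar x}$, $\forall t \ge 0$. The
condition is satisfied for $t = 0$ by the Theorem assumption. Let us assume, for the sake of inductive argument, that $\hat S_k^+ \ne \emptyset$
and $x_k \in B_{\bar x}$, $\forall k \in [0, t-1]$. From this assumption and the way the weighting vector $a_t$ is updated in (39), it follows that
$a_k \in \hat S_k^+$, $\forall k \in [0, t-1]$. From Assumptions 3 and 6 it follows that $w_k \in X \times X$ and $u_k \in U$, $\forall k \in [-N, t-1]$. From
the definition of $\hat S_k^+$ in (37), Assumptions 1, 2 and Theorem 3.1 it then follows that:

$$|f_k(x_k, r_{k+1}) - f^*(x_k, r_{k+1})| \le \gamma_{\Delta,k} \|x_k\| + \sigma, \quad \forall k \in [0, t-1]. \tag{45}$$

From (44) and (45) it then holds that:

$$\|v_k\| \le \gamma_g \gamma_{\Delta,k} \|x_k\| + \gamma_g \sigma, \quad \forall k \in [0, t-1]. \tag{46}$$

From Lemma 4.2 and the condition that $N \ge \bar N \ge \underline{N}$, it follows that $\hat\gamma_{g,t} \ge \gamma_g - c_{\gamma_g}$, and hence from (41) it holds that
$\gamma_{\Delta,t} < \frac{1}{\gamma_g \lambda_2^*}$, $\forall t \ge 0$. By using this and the fact that $\gamma_{\Delta,t} \le \bar\gamma_\Delta$, we can show that if $\hat S_k^+ \ne \emptyset$ and $x_k \in B_{\bar x}$, $\forall k \in [0, t-1]$,
then it holds that:

$$\|x_t\|_\infty \le \frac{\lambda_1^*}{1 - \gamma_g \lambda_2^* \bar\gamma_\Delta} \|r_t\|_\infty + \frac{\gamma_g \lambda_2^* \sigma + \beta^*}{1 - \gamma_g \lambda_2^* \bar\gamma_\Delta} + \frac{\lambda_2^*}{1 - \gamma_g \lambda_2^* \bar\gamma_\Delta} \|e_t\|_\infty. \tag{47}$$

From the definition of $\overline{f}_t$ and $\underline{f}_t$ in (38) and of $\overline{f}_c$ and $\underline{f}_c$ in (28), it follows that $\overline{f}_t(w) \le \overline{f}_c(w)$ and $\underline{f}_t(w) \ge \underline{f}_c(w)$, $\forall w \in
B_{\bar x \bar r}$, $\forall t \ge 0$. Hence, it follows that:

$$\overline{f}_t(w) - \underline{f}_t(w) \le \overline{f}_c(w) - \underline{f}_c(w), \quad \forall w \in B_{\bar x \bar r}. \tag{48}$$

From the definition of $\bar x$ in (40) it then follows that $w_t^+ \in B_{\bar x \bar r}$ and hence from (48) and (23) it holds that:

$$-\gamma_{\Delta,t} \|x_t\| - \sigma + \overline{f}_t(w_t^+) \le \gamma_{\Delta,t} \|x_t\| + \sigma + \underline{f}_t(w_t^+), \tag{49}$$

and therefore $\hat S_t^+ \ne \emptyset$. Repeating this inductive argumentation for all $t \ge 0$, it follows that $\hat S_t^+ \ne \emptyset$, $\forall t \ge 0$. In addition,
(47) will hold for all $t \ge 0$ which implies that the closed loop system is finite gain stable (see e.g. Definition 2.1). ■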
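The rearrangement that leads from the gain inequality for the optimal inverse and the bound on $v$ to the final state bound can be spelled out as follows. This is a sketch under the assumptions used in the proof, namely that $\gamma_{\Delta,k} \le \bar\gamma_\Delta$ for all $k$, that $1 - \gamma_g \lambda_2^* \bar\gamma_\Delta > 0$, and that the supremum norms are taken over the time steps covered by the induction:

$$\begin{aligned}
\|v\|_\infty &\le \gamma_g \bar\gamma_\Delta \|x\|_\infty + \gamma_g \sigma, \\
\|x\|_\infty &\le \lambda_1^* \|r\|_\infty + \lambda_2^* \left(\gamma_g \bar\gamma_\Delta \|x\|_\infty + \gamma_g \sigma\right) + \lambda_2^* \|e\|_\infty + \beta^*, \\
\left(1 - \gamma_g \lambda_2^* \bar\gamma_\Delta\right) \|x\|_\infty &\le \lambda_1^* \|r\|_\infty + \gamma_g \lambda_2^* \sigma + \lambda_2^* \|e\|_\infty + \beta^*.
\end{aligned}$$

Dividing by the positive factor $1 - \gamma_g \lambda_2^* \bar\gamma_\Delta$ gives a bound of the form required by the definition of finite gain stability, with the reference $r$ in the role of the input. With $\|r\|_\infty \le \bar r$ and $\|e\|_\infty \le \epsilon$, and provided that $\gamma_g \le \hat\gamma_{g,-1} + c_{\gamma_g}$, the right-hand side does not exceed the redefined state amplitude $\bar x$, which is what keeps the state inside $B_{\bar x}$.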

Note that Theorem 4.1 does not give any relation between the lower limit $\underline{N}$ on the number of collected initial training
samples and the tuning parameters $c_\delta$, $c_{\gamma^*}$ and $c_{\gamma_g}$. These values need to be chosen by the control designer and should
reflect the designer's assessment of the quality with which the noise bound and the Lipschitz constants are estimated on the basis of the
training data.